---
title: "Semantic Agent"
description: "Enhance the PandasAI library with the Semantic Agent for more accurate and interpretable results."
---

## Introduction to the Semantic Agent

The `SemanticAgent` (currently in beta) extends the capabilities of the PandasAI library by adding a semantic layer to its results. Unlike the standard `Agent`, the `SemanticAgent` generates a JSON query, which can then be used to produce Python or SQL code. This approach ensures more accurate and interpretable outputs.

> **Note:** Usage of the Semantic Agent in production is subject to a license. For more details, refer to the [license documentation](https://github.com/Sinaptik-AI/pandas-ai/blob/master/pandasai/ee/LICENSE).
> If you plan to use it in production, [contact us](https://forms.gle/JEUqkwuTqFZjhP7h8).

## Instantiating the Semantic Agent

Creating an instance of the `SemanticAgent` is similar to creating an instance of an `Agent`.

```python
from pandasai.ee.agents.semantic_agent import SemanticAgent
import pandas as pd

df = pd.read_csv('revenue.csv')

agent = SemanticAgent(df, config=config)
agent.chat("What are the top 5 revenue streams?")
```

## How the Semantic Agent Works

The Semantic Agent operates in two main steps:

1. Schema generation
2. JSON query generation

### Schema Generation

The first step is schema generation, which structures the data into a schema that the Semantic Agent can use to generate JSON queries. By default, this schema is automatically created, but you can also provide a custom schema if necessary.

#### Automatic Schema Generation

By default, the `SemanticAgent` considers all dataframes passed to it and generates an appropriate schema.

#### Custom Schema

To provide a custom schema, pass a `schema` parameter during the instantiation of the `SemanticAgent`.

```python
salaries_df = pd.DataFrame(
    {
        "EmployeeID": [1, 2, 3, 4, 5],
        "Salary": [5000, 6000, 4500, 7000, 5500],
    }
)

employees_df = pd.DataFrame(
    {
        "EmployeeID": [1, 2, 3, 4, 5],
        "Name": ["John", "Emma", "Liam", "Olivia", "William"],
        "Department": ["HR", "Marketing", "IT", "Marketing", "Finance"],
    }
)

schema = [
    {
        "name": "Employees",
        "table": "Employees",
        "measures": [
            {
                "name": "count",
                "type": "count",
                "sql": "EmployeeID"
            }
        ],
        "dimensions": [
            {
                "name": "EmployeeID",
                "type": "string",
                "sql": "EmployeeID"
            },
            {
                "name": "Department",
                "type": "string",
                "sql": "Department"
            }
        ],
        "joins": [
            {
                "name": "Salaries",
                "join_type":"left",
                "sql": "Employees.EmployeeID = Salaries.EmployeeID"
            }
        ]
    },
    {
        "name": "Salaries",
        "table": "Salaries",
        "measures": [
            {
                "name": "count",
                "type": "count",
                "sql": "EmployeeID"
            },
            {
                "name": "avg_salary",
                "type": "avg",
                "sql": "Salary"
            },
            {
                "name": "max_salary",
                "type": "max",
                "sql": "Salary"
            }
        ],
        "dimensions": [
            {
                "name": "EmployeeID",
                "type": "string",
                "sql": "EmployeeID"
            },
            {
                "name": "Salary",
                "type": "string",
                "sql": "Salary"
            }
        ],
        "joins": [
            {
                "name": "Employees",
                "join_type":"left",
                "sql": "Contracts.contract_code = Fees.contract_id"
            }
        ]
    }
]

agent = SemanticAgent([employees_df, salaries_df], schema=schema)
```

### JSON Query Generation

The second step involves generating a JSON query based on the schema. This query is then used to produce the Python or SQL code required for execution.

#### Example JSON Query

Here's an example of a JSON query generated by the `SemanticAgent`:

```json
{
  "type": "number",
  "dimensions": [],
  "measures": ["Salaries.avg_salary"],
  "timeDimensions": [],
  "filters": [],
  "order": []
}
```

This query is interpreted by the Semantic Agent and converted into executable Python or SQL code.

## Deep Dive into the Schema and the Query

### Understanding the Schema Structure

A schema in the `SemanticAgent` is a comprehensive representation of the data, including tables, columns, measures, dimensions, and relationships between tables. Here's a breakdown of its components:

#### Measures

Measures are the quantitative metrics used in the analysis, such as sums, averages, counts, etc.

- **name**: The identifier for the measure.
- **type**: The type of aggregation (e.g., `count`, `avg`, `sum`, `max`, `min`).
- **sql**: The column or expression in SQL to compute the measure.

Example:

```json
{
  "name": "avg_salary",
  "type": "avg",
  "sql": "Salary"
}
```

#### Dimensions

Dimensions are the categorical variables used to slice and dice the data.

- **name**: The identifier for the dimension.
- **type**: The data type (e.g., string, date).
- **sql**: The column or expression in SQL to reference the dimension.

Example:

```json
{
  "name": "Department",
  "type": "string",
  "sql": "Department"
}
```

#### Joins

Joins define the relationships between tables, specifying how they should be connected in queries.

- **name**: The name of the related table.
- **join_type**: The type of join (e.g., `left`, `right`, `inner`).
- **sql**: The SQL expression to perform the join.

Example:

```json
{
  "name": "Salaries",
  "join_type": "left",
  "sql": "Employees.EmployeeID = Salaries.EmployeeID"
}
```

### Understanding the Query Structure

The JSON query is a structured representation of the request, specifying what data to retrieve and how to process it. Here's a detailed look at its fields:

#### Type

The type of query determines the format of the result, such as a single number, a table, or a chart.

- **type**: Can be "number", "pie", "bar", "line".

Example:

```json
{
  "type": "number",
  ...
}
```

#### Dimensions

Columns used to group the data. In an SQL `GROUP BY` clause, these would be the columns listed.

- **dimensions**: An array of dimension identifiers.

Example:

```json
{
  ...,
  "dimensions": ["Department"]
}
```

#### Measures

Columns used to calculate data, typically involving aggregate functions like sum, average, count, etc.

- **measures**: An array of measure identifiers.

Example:

```json
{
  ...,
  "measures": ["Salaries.avg_salary"]
}
```

#### Time Dimensions

Columns used to group the data by time, often involving date functions. Each `timeDimensions` entry specifies a time period and its granularity. The `dateRange` field allows various formats, including specific dates such as `["2022-01-01", "2023-03-31"]`, relative periods like "last week", "last month", "this month", "this week", "today", "this year", and "last year".

Example:

```json
{
  ...,
  "timeDimensions": [
    {
      "dimension": "Sales.time_period",
      "dateRange": ["2023-01-01", "2023-03-31"],
      "granularity": "day"
    }
  ]
}
```

#### Filters

Conditions to filter the data, equivalent to SQL `WHERE` clauses. Each filter specifies a member, an operator, and a set of values. The operators allowed include: "equals", "notEquals", "contains", "notContains", "startsWith", "endsWith", "gt" (greater than), "gte" (greater than or equal to), "lt" (less than), "lte" (less than or equal to), "set", "notSet", "inDateRange", "notInDateRange", "beforeDate", and "afterDate".

- **filters**: An array of filter conditions.

Example:

```json
{
  ...,
  "filters": [
    {
      "member": "Ticket.category",
      "operator": "notEquals",
      "values": ["null"]
    }
  ]
}
```

#### Order

Columns used to order the data, equivalent to SQL `ORDER BY` clauses. Each entry in the `order` array specifies an identifier and the direction of sorting. The direction can be either "asc" for ascending or "desc" for descending order.

- **order**: An array of ordering specifications.

Example:

```json
{
  ...,
  "order": [
    {
      "id": "Contratti.contract_count",
      "direction": "asc"
    }
  ]
}
```

### Combining the Components

When these components come together, they form a complete query that the Semantic Agent can interpret and execute. Here's an example that combines all elements:

```json
{
  "type": "table",
  "dimensions": ["Department"],
  "measures": ["Salaries.avg_salary"],
  "timeDimensions": [],
  "filters": [
    {
      "member": "Department",
      "operator": "equals",
      "values": ["Marketing", "IT"]
    }
  ],
  "order": [
    {
      "measure": "Salaries.avg_salary",
      "direction": "desc"
    }
  ]
}
```

This query translates to an SQL statement like:

```sql
SELECT Department, AVG(Salary) AS avg_salary,
FROM Employees
JOIN Salaries ON Employees.EmployeeID = Salaries.EmployeeID
WHERE Department IN ('Marketing', 'IT')
GROUP BY Department
ORDER BY avg_salary DESC;
```
